Abstract. This paper describes ongoing research into the role of optic-flow-derived
spatial representations and their relation to cognitive computational models of
mental rotation in primates, with the goal of producing effective and unique
autonomous robot navigational capabilities. A theoretical framework is outlined
based on a vectorial interlingua spanning perception, cognition and motor control.
Progress to date on its implementation within an autonomous robot control
architecture is presented.
1  Introduction

Envisioning, as defined in this paper, is a process by which short-term non-durable
representations are created from optic flow, which are then used to produce a
vectorial navigational control signal to a mobile robot based on an analog of
primates’ mental rotation capability. Research into envisioning can lead to a deeper
understanding of the processes and representation by which mental rotations occur in
the primate brain, establish the value and need for representation during map-free
navigation, and potentially provide unique navigational capabilities to autonomous
robotic systems.

Mental rotation ability has been observed in numerous animals, especially primates,
and we assert that it affords a navigational advantage to animal [Aretz and
Wickens 92] and ultimately robot alike. “The first demonstration of mental rotation
for visual imagery in animals” was provided by [Vauclair et al 93], although the
mechanisms by which this occurs remain unclear, especially relative to human
performance.
The  somewhat  commonplace  nature  of  this  capability  in  higher  animals  (humans 
[Shepard  73],  baboons  [Hopkins  et  al  93],  rhesus  monkeys  (mixed  results)  [Kohler  et 
al  05],  and  sea  lions  [Mauck  and  Dehnhardt  97,  Stich  et  al  03])  indicates  that  mental 
rotation  is  likely  to  serve  some  useful  evolutionary  function.  Some  animals  do  not 
seem  to  possess  the  same  mental  rotation  process  that  humans  do,  but  rather  use  other 
means to solve similar problems (e.g., the lion-tailed macaque). “The question by
which mode of information processing our monkeys solved the [high angle mental
rotation] task ... remains unanswered” [Burmann et al 05]. We are less concerned,
however, with justifying the underlying biological mechanisms used in these various
species than with establishing a pathway for their analogous, bio-inspired
implementation in autonomous robots.

The principle of biological economy argues that mental rotation exists in nature
because it confers some utility or advantage on the animal [Arkin 98]. According to
this empirical principle, every evolutionary development has its reason and serves a
necessary purpose [Dennett 06]. That includes mental rotations. It is our belief that
robots can only benefit by having a similar capability at their disposal. Our
research goal is to understand exactly what that advantage is and how it can be
exploited in intelligent robots.

It is posited that multiple systems are in place for navigational tasks undertaken by
an intelligent agent. Evidence exists that “object-based [mental rotation] and
egocentric spatial transformations [left-right] rely on different processing systems”
[Kozhevnikov et al 06]. [Taylor et al 08] further distinguish between navigation,
which does not require recruitment of spatial representation, and wayfinding, which
draws on experience-based spatial mental models. This is supported by [Kohler et al
05]: “In summary, our results support the idea of two separately evolved information
processing systems - mental rotation and rotational invariance [in baboons]”.
Restated, the question confronting us is what role, if any, mental rotation plays in
navigation, and under what circumstances it can prove useful in support of robotic
navigation. It is not suggested that mental rotation capability alone is adequate for
intelligent navigation, but rather that it serves a particular niche in that process.

Mental rotations are guided by motor processes at least in part, even for abstract
objects. “Mental rotation is a covert simulation of motor rotation” [Wexler et al 98],
i.e., the action is planned but not executed - a form of envisioning. Their hypothesis
is that “Visuomotor anticipation is the engine that drives mental rotation”, which is
supported by multiple experimental studies. [Georgopoulos et al 86] further
hypothesize that a subject may solve this problem by a mental rotation of an imagined
movement vector from the direction of the stimulus to the direction of the actual
movement. In their studies of monkey motor cortex they observe that the neuronal
population vector is a weighted sum of contributions (votes) of directionally tuned
neurons; each neuron is assumed to vote in its own preferred direction with a
strength that depends on how much the activity of the neuron changes for the movement
under consideration. Their proposed account “involve(s) the creation and mental
rotation of an imagined [or rather envisioned] movement vector from the direction of
the light to the direction of the movement. ... The results provide direct, neural
evidence for the mental rotation hypothesis.” Kinesthetic representations are also
believed to play a role, especially for the congenitally blind [Paivio 97], but they
will not be considered in our robotic navigation application.

Others entertain an alternate representational account using piecemeal propositional
models involving symbols resembling language instead of visual analog models as a
means to account for mental rotations [Pylyshyn 73, Anderson 78, Yuille and Steiger
83], although doubt has been cast upon their general validity [Paivio 90].
[Khooshabeh 09] argues that “mental rotations most likely involves analogue
processes”, but it remains unclear whether whole or piecemeal component-by-component
rotations occur in all humans. This may be explained by individual differences
between those human subjects possessing varying degrees of spatial ability. For our
purposes of robotic control, we will explore the visual analog representational view.

In the Georgia Tech Mobile Robot Laboratory, this recently commenced three-year
project, sponsored by the Office of Naval Research and entitled Primate-inspired
ground vehicle control, has three major objectives:

1.  To understand, create, and apply methods and models by which primates cognitively
manage and manipulate spatial information in their world;

2.  To develop efficient robust perceptual techniques capable of exploiting and
populating these models; and

3.  To integrate and test these ideas within a proven navigational framework capable
of both deliberative and behavioral autonomous robot control.

This paper largely focuses on (1) and its role in robotic navigational control. In
particular, our primary approach applies mental rotations to transient spatial
representations derived from optic flow.

Bio-inspired methods serve as the starting point for this research. Understanding
primate navigation systems requires understanding how primates solve spatial
cognition problems. This involves both spatial memory and, more critically, the
manipulation of spatial information. How spatial memory is organized is a technically
soluble problem that builds on a substantial body of work already accomplished in
robotics (e.g., [Kuipers 08]). The manipulation of spatial information and the
processes necessary to create and support a navigation system are not as well
understood. Studies in humans demonstrate the importance of geometry in spatial
manipulations that humans augment with external maps [Landau and Lakusta 09].
Nonhuman primates do not have the map option for deriving the spatial information
necessary for navigation. In solving spatial problems that involve recognizing the
same shape in a different orientation, two different strategies appear to be employed
by preverbal infants and likely by nonhuman primates. The first is a mental mapping
approach in which a model is mentally rotated to match one of several potential
target configurations. The second approach is a feature-based system, in which
geometric features are extracted from the model and converted to vectors, allowing
matching with targets without resorting to mental mapping and rotation [Lourenco and
Huttenlocher 07]. While this second approach is less likely to be subject to the time
delays that are seen in the mental manipulation approach, both avenues are
considered, with initial focus on mental rotations.


2  Transient  Visual  Representation  via  Optic  Flow 

Many  low-level  perception  and  navigation  tasks  are  based  on  wide  field-of-view, 
peripheral,  optical  flow  computation.  [Gibson  50,  79],  as  paraphrased  by  Duffy  and 
Wurtz  [95],  remarked  that: 

While moving through the environment, the visual world streams around observers
in a pattern which reflects their motion. These optic flow fields combine the effects
of all observer movements in three-dimensional space to provide visual information
that can guide self-motion, stabilize posture, and reveal the structure of the
environment.

Optical flow encodes both instantaneous platform motion and scene structure, and it
has long been known that flies and other insects use optical flow to enable agile
maneuvering in flight, robust landing behaviors, etc. [Srinivasan et al., 09; Wehner
et al., 96; Heisenberg and Wolf, 93]. In primate and human vision, optical flow is
used for estimating heading [Lappe et al., 99] and can be used in walking [Warren et
al., 01], driving [Land, 01], and updating the egocentric location of objects in the
short term [Wolbers et al., 08]. Computational models of how flying insects use optic
flow have been demonstrated in guiding small aerial and ground vehicles [Franceschini
et al., 92], and continue to inspire, e.g., witness the recent work by Srinivasan et
al. [09].

Ego-motion and heading can be recovered robustly and efficiently from optical flow.
The advantages of using optical flow increase when tightly integrated with stereo,
which we exploited to create very efficient visual odometry algorithms [Ni and
Dellaert 06, Kaess et al. 09]. Optical flow can be computed very efficiently in
parallel, even at the level of the sensor, where commercial optical flow sensors are
already available: at each pixel location in an image the local flow can be computed
using local spatio-temporal convolution kernels that implement the same computation
as in biological systems.

For this project, optic flow methods are being developed to rapidly recover a 3D
snapshot of the immediate environment in front of the vehicle. While optical flow
fields are high-dimensional measurements that seemingly require a lot of
computational effort to understand and analyze, they can in fact be decomposed into
much simpler “basis flows” that make it easy to extract the relevant information from
them, as shown in our earlier work [Roberts et al. 09]. The flow field due to
rotation does not depend on the 3D structure of the scene and is a simple linear
combination of three rotational basis flows. This property can be used to (a)
robustly estimate rotation or, (b) when rotation is known from an IMU, immediately
remove its contribution from the flow fields. After removing the rotational flow
field contribution, what is left is the flow contribution that depends on 3D scene
structure. These snapshots will prove central to the envisioning component described
next.
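
As an illustration of the rotational decomposition described above, the following
minimal sketch builds the three rotational basis flows at a pixel and subtracts
their weighted sum from a measured flow field. It assumes a pinhole camera with
unit focal length, normalized image coordinates, and one common sign convention for
rotational flow; these choices, and the function names rotational_basis,
rotational_flow and derotate, are assumptions made for this example and are not
taken from [Roberts et al. 09].

```python
def rotational_basis(x, y):
    """Flow (u, v) at normalized image point (x, y) for unit rotation about
    the camera x, y and z axes (assumed convention, unit focal length)."""
    return (
        (x * y, 1.0 + y * y),        # rotation about the x axis
        (-(1.0 + x * x), -x * y),    # rotation about the y axis
        (y, -x),                     # rotation about the optical (z) axis
    )


def rotational_flow(x, y, omega):
    """Rotational flow as a linear combination of the three basis flows."""
    basis = rotational_basis(x, y)
    u = sum(w * b[0] for w, b in zip(omega, basis))
    v = sum(w * b[1] for w, b in zip(omega, basis))
    return u, v


def derotate(points, flow, omega):
    """Remove the rotation-induced component, e.g. with omega from an IMU.
    What remains depends on translation and scene depth."""
    residual = []
    for (x, y), (u, v) in zip(points, flow):
        ru, rv = rotational_flow(x, y, omega)
        residual.append((u - ru, v - rv))
    return residual
```

Note that the rotational component is independent of depth, which is why it can be
removed without knowing the scene structure.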

3  Envisioning  using  Mental  Rotations:  An  Overview 

Tying the perceptual processing based on optic flow to computationally parsimonious
robotic control requires considerable insight and effort. It is our contention that
existing primate studies, both cognitive (e.g., [Landau and Lakusta 09, Lourenco and
Huttenlocher 07]) and neuroscientific (e.g., [Duffy 98, Kavcic and Duffy 03]), can
provide the mechanisms to accomplish this.

Our research group has an extensive history in the importation of biological models
into the control of robotic systems, ranging from the sowbug [Endo and Arkin 01],
praying mantis [Arkin et al 00], birds [Duncan et al 09], wolves [Madden et al 10],
and dogs [Arkin et al 03] to humans [Moshkina et al 11, Wagner and Arkin 09], among
others.


Robotic navigation is often conducted by first constructing a map and then plotting
a course through it. Alternatively, purely reactive methods use an egocentric frame
and respond directly in a behavioral manner to incoming sensory information. In a
novel paradigm, early work by Stein [Stein 94] considered imagining as a basis for
navigation, simulating in advance what the robot should do before doing it. This
capacity for simulation (imagining future actions) provides potentially useful
feedback as to the utility and relevance of any plans under consideration. The
quality of feedback is directly related to the quality of the simulation itself and
the accuracy of its underlying assumptions about the world. Stein’s work injected a
navigational simulator directly into the behavioral controller, and was only loosely
inspired by cognitive considerations. It also considered plans, an outcome of
deliberative reasoning, as the basis for imagination.

Ecological psychology, as advocated by J.J. Gibson [Gibson 79], demanded a deep
understanding of the environment in which the organism was situated, and how
evolution affected its development. The notion of affordances provides a means for
explaining the basis of perception's roots in behavior. This psychological theory
said that things are perceived in terms of the opportunities they afford an agent to
act. All actions are a direct consequence of sensory pick-up. This results from the
tuning by evolution of an organism situated in the world to its available stimuli.
Significant assertions include [Gibson 79]:

•  The environment is what organisms perceive. The physical world differs from the
environment, i.e., it is more than the world described by physics;

•  The observer and the environment complement each other; and

•  Information is inherent in the ambient light and is picked up by the agent's optic
array.

This leads us to the value of nearly instantaneous parsimonious representations
derived from optic flow and managed by a mechanism mapping perception to control
involving minimal cognitive effort (i.e., no deliberative navigational planning or
reliance on longer-term plans as was the case in Stein’s earlier imagining work).

In our research we employ a more cognitively faithful and direct mechanism than
Stein’s approach, which we refer to as envisioning, rather than imagining, due to its
immediacy and short-term projections. Using models inspired by primate mental
rotation experiments (e.g., [Vauclair et al 93, Hopkins et al 93, Kohler et al 05])
and the snapshot spatial models produced from the immersive optical flow work
described earlier, a robot can envision rotating a perceived spatial layout relative
to a goal state, in a manner consistent with the primate process, to provide
navigational guidance. This does not create a route per se, but rather an iterative,
semi-reactive approach for direction-finding towards a particular goal location. This
guidance is frequently updated as the incoming perceptual layout unfolds over space
and time. Figure 1 illustrates this flow through the sensory to cognitive to motor
spaces.


[Figure 1: block diagram linking Optic Flow, Mental Rotation, and Motor Control]

Fig. 1. Optic flow produces snapshots of the spatial layout of the world, which are
then compared to a known goal state, resulting in a translation and rotational
control signal that is repeated continuously until the final goal state is achieved.

Using mental rotations as the basis for navigation is believed to be a relatively, if
not completely, unexplored area for robotic navigation to date, one which can draw on
the wisdom of evolutionary capabilities and the power of short-term optic flow
representations. We also believe the results of this work are extensible to more
complex navigational problems that involve higher numbers of degrees of freedom for
the platform, up to and including mobile manipulation tasks. Envisioning is thus
neither reactive nor deliberative in the traditional sense, but rather involves
semi-reactive forces acting upon a mental model created from iterative optic flow
representations, yielding rotations and translations in the model that correspond to
real world navigation.

We then address the mechanisms by which mental rotations occur in primates, to the
extent to which they are known, in order to generate a suitable computational
formalism and model. This is being integrated and tested in simulation first and then
subsequently on robotic platforms in both indoor and outdoor environments. These
models are being refined based on the results of these experiments with the goals of
both producing robust, computationally inexpensive navigational control and gaining
insights into the biological processing of information by primates. While the main
focus is on nonhuman primates, the mental rotation capacity of humans is considered
where appropriate (e.g., [Shepard and Cooper 82], [Yule 06], [Amorim et al 06]).

Multiple aspects of the cognitive and neural sides of mental rotation inform our
research. Evidence exists that transformations of mental images are guided by motor
processes [Wexler et al 98]. We also consider the dual, where mental rotations inform
motor processes. Mental rotation in humans is used for discrimination of left and
right turns in maps [Shepard and Hurwitz 84]. Investigations into mental rotation
representations [Khooshabeh and Hegarty 10] and the role of visual working memory
[Hyun and Luck 07] have also been performed. Neural studies [Georgopoulos and
Pellizzer 95] provide supporting evidence for the presence of underlying vectorial
representations in mental rotations.


4  Vectorial  Interlingua 

One  of  the  common  challenges  that  interdisciplinary  research  presents  is  establishing 
a  common  representational  framework  for  both  discourse  and  modeling.  Fortunately 
we  have  significant  experience  with  this  problem  and  have  developed  methods  to 
address  it.  Our  earlier  work  [Arkin  et  al  00,  Arkin  et  al  98,  Weitzenfeld  et  al  98]  used 
schema  theory  as  a  common  language  to  tie  together  biological  modeling  of  a  praying 
mantis  and  amphibians  with  robotic  control. 

We are moving towards a more fundamental mathematical structure that already exists
and is widely accepted in the biological, computer vision, and robotics communities,
namely vectorial representations, to span the sensorimotor and cognitive components
of our research.

•  For robotic control, potential field or vectorial representations are often used
for navigational purposes in 2D or 3D worlds [Khatib 85, Arkin 87]. They have been
further developed through the use of dynamical system models [Schöner and Dose 92].
There is great value not only in the mathematical framework for expressing action,
but also in the composability of behaviors when using these formalisms, often
expressed in motor schema theory [Arkin 89].

•  Optical flow (cf. Section 2), by its very nature, results in vector spaces in
egocentric imagery.

•  Biologists have recognized, within the central nervous system (CNS), vectorial
mappings in the spinal cord that translate perceptual stimuli onto motor commands
[Bizzi et al 91, Georgopoulos 86], some of which have already been applied to robotic
control [Giszter et al 00]. Neurobiology often argues for the hypothesis of a
vectorial basis for motor control, something that can be readily translated into
robotic control systems. Research at MIT [Bizzi et al 91] has shown that a neural
encoding of potential limb motion encompassing direction, amplitude, and velocity
exists within the spinal cord of the deafferented frog. Microstimulation of different
regions of the spinal cord generates specific force fields directing the forelimb to
specific locations. These convergent force fields move the limb towards an
equilibrium point specified by the region stimulated. The limb itself can be
considered as a set of tunable springs as it moves towards its rest position
(equilibrium point). Thus the planning aspects of the CNS translate into establishing
the equilibrium points which implicitly specify a desired motion. Of particular
interest is the observation that multiple stimulations give rise to new spatial
equilibrium points generated by simple vector addition. Experiments in humans
[Shadmehr and Mussa-Ivaldi 94] have been shown to be consistent with this force-field
model when applied to reaching tasks.
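
The vector-addition property noted in the last item can be illustrated with a small
sketch. It assumes, purely for illustration, that each convergent force field
behaves like a linear spring pulling towards its own equilibrium point; the helper
names spring_field, add_fields and settle, and the stiffness and gain values, are
hypothetical. Under that assumption the sum of two fields is again a convergent
field whose equilibrium is the stiffness-weighted mean of the two original
equilibria.

```python
def spring_field(equilibrium, stiffness):
    """Convergent force field F(p) = stiffness * (equilibrium - p)."""
    ex, ey = equilibrium

    def field(p):
        return (stiffness * (ex - p[0]), stiffness * (ey - p[1]))

    return field


def add_fields(*fields):
    """Combine fields by simple vector addition at every point."""
    def total(p):
        forces = [f(p) for f in fields]
        return (sum(f[0] for f in forces), sum(f[1] for f in forces))

    return total


def settle(field, start, gain=0.1, steps=500):
    """Follow the force until the point comes to rest (illustrative
    first-order dynamics, not a limb or robot model)."""
    x, y = start
    for _ in range(steps):
        fx, fy = field((x, y))
        x += gain * fx
        y += gain * fy
    return (x, y)
```

For example, summing a field of stiffness 1 centered at (0, 0) with a field of
stiffness 3 centered at (4, 0) yields a combined field that settles at (3, 0),
regardless of the starting point.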


It is our intention in this work to exploit this representational commonality for
mapping incoming optic flow field information, via a processing structure consistent
with CNS representations, to a vectorial motor control manifold for completely
representing the entire end-to-end sensorimotor pathway for flow-field navigation in
primates and robots. This spanning representational structure forms a contribution in
its own right, but will also provide the formal methods for implementing the
envisioning approach described earlier.

Earlier research has used neuronal population encoding to achieve an internal
representation that could then be used to give control commands to the motor layer
[Georgopoulos, Schwartz and Kettner, 1986]. This approach used a large population
vector which would then indicate the movement vector direction. The population vector
is large, since there is an entry for each neuron, and this makes internal processing
inefficient. Therefore, we adopt a different, more efficient strategy.
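
For reference, the population-vector readout used in that earlier line of work can
be sketched as below. The sketch assumes cosine-tuned neurons with uniformly spaced
preferred directions, which is an idealization chosen for this example; the function
names are hypothetical. It illustrates the earlier encoding, with one entry per
neuron, rather than the strategy adopted here.

```python
import math


def population_vector(preferred_dirs, activity_changes):
    """Weighted sum of unit vectors: each neuron votes in its preferred
    direction with a strength equal to its change in activity."""
    px = sum(w * math.cos(phi) for phi, w in zip(preferred_dirs, activity_changes))
    py = sum(w * math.sin(phi) for phi, w in zip(preferred_dirs, activity_changes))
    return px, py


def decode_direction(preferred_dirs, activity_changes):
    """Direction (radians) indicated by the population vector."""
    px, py = population_vector(preferred_dirs, activity_changes)
    return math.atan2(py, px)


def cosine_tuned_activity(preferred_dirs, movement_dir):
    """Idealized tuning curve (assumption): the activity change is largest
    when the movement matches the neuron's preferred direction."""
    return [math.cos(movement_dir - phi) for phi in preferred_dirs]
```

With, say, 16 uniformly spaced preferred directions and a movement at 40 degrees,
decode_direction recovers the movement direction from the 16 activity values.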

In order to model the navigation system faithfully after primate navigation
principles, the concept of mental rotation is applied. Mental rotation refers to the
rotational transformation of an object’s visual mental image [Takano and Okubo 2002].
Research has indicated that several primates might perform mental rotation in their
visual processing system, at least while distinguishing mirror images [Hopkins et al
1992; Mauck and Dehnhardt 1997; Burmann et al 2005]. In some primates, the response
time of mental rotation was observed to be a function of the angular disparity;
plotted from 0 degrees to 360 degrees, the graph forms an inverted V with
non-monotonicities at 180 degrees. While there has been no conclusive evidence to
date that mental rotations are performed by all primates in their visual processing
tasks, there are indications that at least a few of them successfully employ mental
rotation techniques to identify objects. It has been observed that familiarity
increases the speed of response while doing a mental rotation [Burmann et al 2005]
while complexity brings about an increase in response time [Hopkins et al 1992].
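
The inverted-V relationship between response time and angular disparity can be
written down as a simple idealized model. The sketch below assumes a linear increase
with the shortest angular distance to the upright orientation; the function name and
the base time and slope values are hypothetical placeholders, not measured data, and
the non-monotonicities near 180 degrees are deliberately not modeled.

```python
def response_time(angle_deg, base_s=1.0, slope_s_per_deg=0.01):
    """Idealized inverted-V model: response time grows linearly with the
    shortest angular distance between the stimulus and the standard."""
    a = angle_deg % 360.0
    shortest = min(a, 360.0 - a)
    return base_s + slope_s_per_deg * shortest
```

Under this model a 90 degree and a 270 degree disparity take equally long, and the
maximum falls at 180 degrees.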

A secondary motivation in choosing mental rotations to represent the internal
processing system is that prior research has proposed a relationship between motor
processes and mental rotations [Wexler et al 1998]. There is evidence to indicate
that mental rotation in the same direction as manual rotation is faster than when the
direction is opposite to manual rotation. Also, the speed of motor rotation has an
influence on the speed of mental rotation.


5  Architectural  Integration 

In order to conduct the robotic experiments required to both verify the primate
models generated and to tie in the perceptual processing described earlier, these
computational methods are being integrated into our existing architectural framework,
MissionLab¹ [MacKenzie et al 97, Endo et al 04], for prototypical navigational tasks
(Figure 2) such as interior building operations or outdoor environments.


¹ MissionLab, now in version 7.0, is freely available for research and education at
http://www.cc.gatech.edu/ai/robot-lab/research/MissionLab/.


Fig. 2. MissionLab mission specification system. (Left) In the field deployment,
(Right) 3D simulation environment.


[Kosslyn et al 94, Hyun and Luck 07] view mental rotation as a visuospatial process
in which mental images utilize a visual buffer for transformation. [Khooshabeh and
Hegarty 10] assume that the representation is a 3D mental image that includes metric
information about the shapes of the objects (lengths and angles). Our implementation
approach maintains short-term visual buffers for storing the optical flow depth maps
and uses metric spatial information regarding the world as the basis for producing
the envisioned mental rotation vectors that ultimately are transformed into robotic
control vectors.

[Johnson  90]  suggested  mental  rotations  involved  a  sequential  process: 

1)  Form  (encode)  a  mental  representation  of  an  object 

2)  Rotate  the  object  mentally  until  axial  orientation  allows  the  comparison  to 
the  standard 

3)  Make  the  comparison 

4)  Make  a  judgment 

5)  Report  a  decision 

This basic paradigm is consistent with our approach as well, but it is now iterative,
providing continuous feedback to the navigational system, and the report in step (5)
now consists of a control vector directed to the robot’s behavioral controller.
[Aretz and Wickens 92] also offer a process model that provides guidance for linking
our optic-flow snapshot representations to the underlying cognitive operations of
mental rotation involved in aligning perceptual encodings of egocentric and world
representations.

The current approach is summarized as follows. The robot’s navigational goal is
reached via a series of sub-goal waypoints, each represented by a depth map derived
from optical flow snapshots. A depth map is an image that contains information
relating to the distances of the surfaces of scene objects from a viewpoint. In our
project, using a depth map enables us to do pixel-based matching to find the
correlation of the current state with the saved goal state.

The robot then moves in the direction of one of these sub-goals as a result of the
outcome of the mental rotation process comparing the existing optic flow snapshot
with the current sub-goal’s snapshot. As the robot (Figure 3) moves closer to the
sub-goal, at each step it compares the depth map generated from the optic flow
snapshot images (Figure 4) it receives from its cameras to its internal
representation of the sub-goal, which is stored in its working memory [Hyun and Luck,
2007]. This comparison is currently done via correlation, which is, speculatively,
how mental rotations may be done in the visual processing systems of primates. Doing
this comparison at each step helps the robot to correct its course in a semi-reactive
manner - responding directly to incoming sensory information using a fleeting
transitory representation derived from optic flow.
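
One way to picture the correlation-based comparison is the following minimal sketch.
It makes several simplifying assumptions that are not part of the system described
here: the snapshot is reduced to a one-dimensional panoramic depth profile (one depth
value per bearing bin), a mental rotation is modeled as a circular shift of that
profile, and the sign convention of the returned turn angle is arbitrary. The
function names are hypothetical.

```python
import math


def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    den_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if den_a == 0.0 or den_b == 0.0:
        return 0.0
    return num / (den_a * den_b)


def envision_rotation(current, goal):
    """Rotate the current profile through every bin offset and keep the
    offset that correlates best with the stored goal profile."""
    best_shift, best_score = 0, -2.0
    for shift in range(len(current)):
        rotated = current[shift:] + current[:shift]
        score = normalized_correlation(rotated, goal)
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score


def turn_command(current, goal):
    """Convert the best offset into a signed turn angle in degrees,
    wrapped to the range (-180, 180]."""
    shift, score = envision_rotation(current, goal)
    angle = 360.0 * shift / len(current)
    if angle > 180.0:
        angle -= 360.0
    return angle, score
```

Calling turn_command at every step with the latest profile and the stored sub-goal
profile gives the kind of frequently updated, semi-reactive rotational guidance
described above; the correlation score can also serve as a rough measure of how
closely the current view matches the sub-goal.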


Fig.  3.  Pioneer  robot  used  in  experiments 


Fig.  4.  Example  optical  flow  snapshot  generated  by  robot  translational  motion 


[Figure 5: diagram with labels "Depth map" and "Depth Maps"]

Fig. 5. Depth map generation from optical flow

The current technique adopted to obtain the depth image involves 2-D pixel-based
correlation matching (Figure 5). There are many algorithms that compute depth images
using this technique [Scharstein and Szeliski 02]. Correlation-based matching
produces dense depth maps by calculating the disparity at each pixel within a
neighborhood. This is achieved by taking a square window of a certain size around the
pixel of interest in the reference image and finding the homologous pixel within the
window in the target image, while moving along the corresponding scanline. The goal
is to find the corresponding (correlated) pixel within a certain disparity range that
minimizes the associated error and maximizes the similarity.
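
The window-based search described above can be sketched as follows. This is a
minimal illustration that uses the sum of absolute differences (SAD) as the error
measure, which is only one of the matching costs surveyed in [Scharstein and
Szeliski 02]; the function names, the window half-width, and the disparity range are
assumptions made for the example. Images are plain lists of rows of intensities.

```python
def sad(ref, tgt, row, col_ref, col_tgt, half):
    """Sum of absolute differences between two square windows."""
    total = 0.0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            total += abs(ref[row + dr][col_ref + dc] - tgt[row + dr][col_tgt + dc])
    return total


def disparity_at(ref, tgt, row, col, half=1, max_disp=4):
    """Slide the window along the same scanline of the target image and
    return the disparity with the lowest matching error."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        c = col - d
        if c - half < 0:          # window would leave the image
            break
        cost = sad(ref, tgt, row, col, c, half)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def disparity_map(ref, tgt, half=1, max_disp=4):
    """Dense disparity map; border pixels without a full window stay 0."""
    rows, cols = len(ref), len(ref[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            out[r][c] = disparity_at(ref, tgt, r, c, half, max_disp)
    return out
```

Larger disparities correspond to nearer surfaces, so the resulting map can be
converted to relative depth once the camera geometry is known.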

At  each  step  in  the  navigation  towards  a  sub-goal,  the  internal  processing  system 
does  a  comparison  of  the  optic  flow  sensory  input  to  the  sub-goal’s  depth  map  stored 
in  the  working  memory  and  sends  control  commands  to  the  motor  layer  to  move  it 
towards  that  location.  Once  the  robot  reaches  a  sub-goal,  it  takes  into  account  the  set 
of  sub-goals  visible  from  its  current  position  and  chooses  the  next  one  attainable  on 
its  path  towards  the  overall  goal.  The  procedure  is  repeated  until  it  reaches  the  final 
goal.  The  robot  recognizes  that  it  has  reached  its  destination  since  that  visual  state  is 
stored  in  its  memory  as  is  the  case  with  the  sub-goals. 

For the robotic control vector component, potential field analogs and vectorial
representations have been used earlier in navigation through 2D and 3D worlds [Arkin
1989, Arkin 92]. Dynamical system models have also been developed [Schöner and Dose,
1992]. Since we use the concept of internal mental rotations, we express the control
commands in terms of force fields, a concept that was suggested to be behind the
control commands of the central nervous system in living beings [Bizzi et al, 1991].
The robot moves because of the forces acting upon it. These forces are due to the
control commands sent by the behavioral control system as it seeks to correct the
navigation path after doing the optical flow snapshot comparisons at each step.


6  Conclusions 


This paper has presented the motivation and outline of an autonomous robotic control
system that integrates the cognitive paradigm of mental rotation as an alternative to
more conventional methods for intelligent navigation. It incorporates insights
gleaned from studies in primates, snapshot-derived optic flow visual imagery, a
spanning vectorial mathematical model, and a software robot architectural
implementation. The goal is to understand where and how the cognitive processes of
mental rotation can provide new capabilities to intelligent systems moving in the
open world.

In the future we will not only expand upon these preliminary results but also
consider: (1) the application of perspective taking [Kozhevnikov et al 06], “where
the viewer attempts to imagine the scene from a position other than his or her actual
viewpoint” [Keehner et al 06]; and (2) rotational invariance, a time-independent
non-analogue visuo-spatial system commonly found in many animals (e.g., pigeons
[Hollard and Delius 82]), including primates [Burmann et al 05, Kohler et al 05].
This may enable us to gain a better understanding of the appropriate role of mental
rotation in robot navigation, particularly in the context of multiple
competing/cooperating object-recognition and navigational systems.